Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization
Abstract
Averaging schemes have attracted extensive attention in deep learning as well as in traditional machine learning. Averaging achieves theoretically optimal convergence and also improves empirical model performance. However, there is still a lack of sufficient convergence analysis for strongly convex optimization. Typically, the convergence of the last iterate of gradient descent methods, referred to as individual convergence, fails to attain optimality due to the existence of a logarithmic factor. In order to remove this factor, we first develop gradient descent averaging (GDA), a general projection-based dual averaging algorithm for the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), where primal and dual averaging schemes are simultaneously utilized. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.
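To make the distinction between the last iterate and the averaged output concrete, here is a minimal sketch in Python. It is not the GDA or SC-PDA algorithm from the paper: it runs plain, unprojected subgradient descent with the standard 1/(λt) step size on an L2-regularized hinge loss (a strongly convex SVM objective) and tracks a uniform running average alongside the last iterate. The function names, the synthetic data, and the parameter values are all assumptions chosen for illustration.

```python
import random


def objective(w, data, lam):
    """L2-regularized hinge loss: lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>))."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in data
    )
    return reg + hinge / len(data)


def subgradient(w, data, lam):
    """One subgradient of the objective above at the point w."""
    g = [lam * wi for wi in w]
    n = len(data)
    for x, y in data:
        # The hinge term contributes only where the margin is violated.
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1.0:
            for j, xj in enumerate(x):
                g[j] -= y * xj / n
    return g


def averaged_subgradient_descent(data, lam=0.1, steps=2000):
    """Return (last iterate, uniform average of iterates).

    Illustrative only: no projection step, full-batch subgradients,
    step size 1 / (lam * t) as is standard for lam-strongly convex objectives.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    avg = [0.0] * dim
    for t in range(1, steps + 1):
        g = subgradient(w, data, lam)
        eta = 1.0 / (lam * t)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        # Incremental update of the uniform average of w_1, ..., w_t.
        avg = [a + (wi - a) / t for a, wi in zip(avg, w)]
    return w, avg


def make_data(n=40, seed=0):
    """Hypothetical toy data: two well-separated classes in the plane."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1.0 if i % 2 == 0 else -1.0
        x = [y + rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)]
        data.append((x, y))
    return data


if __name__ == "__main__":
    data = make_data()
    last, avg = averaged_subgradient_descent(data)
    print("objective at last iterate:", objective(last, data, 0.1))
    print("objective at averaged iterate:", objective(avg, data, 0.1))
```

The sketch only shows where the two outputs come from in a single run. Which of them is better, and at what rate each converges, is exactly the question the abstract addresses, and a toy run like this should not be read as evidence either way.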
Similar resources
Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
Stochastic gradient descent (SGD) is a simple and very popular iterative method to solve stochastic optimization problems which arise in machine learning. A common practice is to return the average of the SGD iterates. While the utility of this is well-understood for general convex problems, the situation is much less clear for strongly convex problems (such as solving SVM). Although the standa...
Adding vs. Averaging in Distributed Primal-Dual Optimization
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework...
Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization
We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among the coordinate descent ...
Efficient Stochastic Gradient Descent for Strongly Convex Optimization
We motivate this study from a recent work on a stochastic gradient descent (SGD) method with only one projection (Mahdavi et al., 2012), which aims at alleviating the computational bottleneck of the standard SGD method in performing the projection at each iteration, and enjoys an O(log T/T) convergence rate for strongly convex optimization. In this paper, we make further contributions along th...
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This mig...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i11.17183